[WIP] Demonstration provider #4988
@timed
def load_demonstration(
This was previously demo_loader.load_demonstration.
return trajectories

@staticmethod
def _get_demo_files(path: str) -> List[str]:
Moved from demo_loader.get_demo_files.
def get_behavior_spec(self) -> BehaviorSpec:
    return self._behavior_spec

def pop_trajectories(self) -> List[DemonstrationTrajectory]:
Need to add docstrings here. The idea is that GAIL, etc. could be converted to call pop_trajectories() directly. Then, if we want DemonstrationProviders to be able to load new demonstrations on the fly, that logic can stay in the DemonstrationProvider; the consumer doesn't need to know about it and just gets a fresh batch of trajectories.
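A minimal sketch of how that could look from the consumer's side, under stated assumptions: the `steps` field, the `InMemoryDemonstrationProvider` class, its `add()` method, and the "pop clears the pending batch" semantics are all hypothetical illustrations, not code from this PR.

```python
from abc import ABC, abstractmethod
from typing import List, NamedTuple


class DemonstrationTrajectory(NamedTuple):
    # Hypothetical stand-in; the real class's fields are not shown in this thread.
    steps: List[int]


class DemonstrationProvider(ABC):
    @abstractmethod
    def pop_trajectories(self) -> List[DemonstrationTrajectory]:
        """Return the trajectories not yet handed out (assumed semantics)."""


class InMemoryDemonstrationProvider(DemonstrationProvider):
    """Hypothetical provider that accumulates trajectories as they arrive."""

    def __init__(self) -> None:
        self._pending: List[DemonstrationTrajectory] = []

    def add(self, trajectory: DemonstrationTrajectory) -> None:
        # New demonstrations can show up at any time; consumers never see this.
        self._pending.append(trajectory)

    def pop_trajectories(self) -> List[DemonstrationTrajectory]:
        # Hand over the current batch and start a fresh, empty one.
        trajectories, self._pending = self._pending, []
        return trajectories


provider = InMemoryDemonstrationProvider()
provider.add(DemonstrationTrajectory(steps=[1, 2, 3]))
first_batch = provider.pop_trajectories()   # one trajectory
second_batch = provider.pop_trajectories()  # empty until more are added
```

With this shape, a consumer such as GAIL would only ever call `pop_trajectories()`, and whether the batch came from disk or arrived on the fly stays inside the provider.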
from mlagents.trainers.trajectory import ObsUtil


class DemonstrationExperience(NamedTuple):
These are trimmed-down versions of the AgentExperience and Trajectory classes, based on what's currently in demo_loader.
Hmm, I feel like we shouldn't duplicate the conversion code here between AgentExperience and Trajectory (esp. with the teammate observations coming in, it becomes quite a fat function, and at some point I imagine we'll have teammate demonstrations as well).
I wonder if we could have a BaseAgentExperience as the base class used both here and in the AgentProcessor, and have AgentExperience (PolicyAgentExperience?) inherit from it? Or find some other way of composing these two.
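One rough way to picture that suggestion is sketched below. It assumes dataclasses rather than NamedTuple, since adding fields in a NamedTuple subclass is not supported, and every field name as well as the `total_reward` helper is hypothetical rather than taken from the codebase.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BaseAgentExperience:
    # Hypothetical fields that demonstrations and policy rollouts would share.
    obs: List[float]
    action: List[float]
    reward: float
    done: bool


@dataclass
class PolicyAgentExperience(BaseAgentExperience):
    # Hypothetical extra data that only exists when a policy produced the step.
    action_probs: List[float]


def total_reward(experiences: Sequence[BaseAgentExperience]) -> float:
    """Shared logic written once against the base class."""
    return sum(exp.reward for exp in experiences)


demo_step = BaseAgentExperience(obs=[0.0], action=[1.0], reward=0.5, done=False)
policy_step = PolicyAgentExperience(
    obs=[0.0], action=[1.0], reward=1.0, done=True, action_probs=[0.9]
)
combined = total_reward([demo_step, policy_step])
```

The point of the shape is that any conversion code typed against `BaseAgentExperience` would work for both demonstration steps and policy steps without being duplicated.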
Shelving this for now, since research won't be able to make use of it for a while, and we're still not sure about the trajectory/experience handling.
Proposed change(s)
Very rough WIP to convert GAIL and BC to use a new DemonstrationProvider interface.
The eventual goal (beyond the scope of this PR) is to let users define their own DemonstrationProvider implementation in a plugin and use that instead of the given LocalDemonstrationProvider.
In this PR:
TODO
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
https://jira.unity3d.com/browse/MLA-1734
Types of change(s)
Checklist
Other comments